Parallel execution: Determining SQL suitability, benefits
When deciding whether SQL is suitable for parallel execution, consider the factors listed in this chapter excerpt, such as the execution time needed for SQL statements.
Solution provider's takeaway: Implementing parallel execution in your customer's SQL environment requires you to know whether a SQL statement is suited to parallel execution. After deciding on suitability, you need a sound implementation plan and an understanding of how to optimize parallel performance.
Optimizing parallel performance
Now that we have a solid grounding in the theory of parallel execution, and understand how to influence and measure parallel execution, we are in a good position to formulate some guidelines for optimizing parallel execution. Here are the guidelines for getting the most out of parallel execution:
- Start with a SQL that is optimized for serial execution.
- Ensure that the SQL is a suitable SQL for parallel execution.
- Ensure that the database server host is suitably configured for parallel execution.
- Make sure that all parts of the execution plan are parallelized.
- Ensure that the requested DOP is realistic.
- Monitor the actual versus requested DOP.
- Check for skew in data and skew in workload between processes.

Let's now look at each of these guidelines in detail.
Start with a SQL that is optimized for serial execution
An optimal parallel plan might be different from an optimized serial plan. For instance, parallel processing usually starts with a table or index scan, whereas the optimal serial plan might be based on an index lookup. However, you should ensure that your query is optimized for serial execution before parallelizing for these reasons:
- The structures and methods of serial tuning—indexing, statistics collections, and such—are often essential for good parallel tuning as well.
- If the resources required for parallel execution are not available, your query might be serialized (depending on the settings of PARALLEL_DEGREE_POLICY and PARALLEL_MIN_PERCENT). In that case, you want to ensure that your parallel query's serial plan is as good as possible.
- A SQL that is poorly tuned might become an even worse SQL—at least in terms of its impact on other users—when it is permitted to consume more of the database server's CPU and IO resources.
When optimizing a SQL statement for parallel execution, start by optimizing the SQL for serial execution.
Ensure that the SQL is a suitable SQL for parallel execution
Not every SQL can benefit from parallel execution. Here are a few examples of SQLs that probably should not be parallelized:
- SQL statements that have a short execution time when executed serially.
- SQL statements likely to be run at high rates of concurrency in multiple sessions.
- SQL statements based on index lookups. Lookups and range scans on nonpartitioned indexes cannot be parallelized; index full scans and lookups on partitioned indexes can be.

Make sure that the SQL to be parallelized is suitable for parallel execution; OLTP-type queries are generally not suitable for parallelization.
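One quick way to gauge suitability (a sketch only, not part of the original text) is to check a candidate statement's average serial elapsed time in V$SQL; ELAPSED_TIME is reported in microseconds, and the LIKE pattern below is a placeholder for your own SQL text:

SQL> SELECT sql_id, executions,
2 ROUND (elapsed_time / GREATEST (executions, 1) / 1000) avg_elapsed_ms
3 FROM v$sql
4 WHERE sql_text LIKE 'SELECT /*+ ordered use_hash(c) */%'
5 ORDER BY avg_elapsed_ms DESC;

Statements that complete in a few hundred milliseconds, or that run at high rates of concurrency, are usually better left serial.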
Ensure that the system is suitably configured for parallel execution
Not all SQLs are suitable for parallel execution, and not all database server hosts are suitable either. In today's world, most physical server hosts will meet the minimum requirements: multiple CPUs and data striped across multiple physical drives. However, some virtual machine hosts might fail to meet those minimum requirements, and desktop machines, which typically have only a single disk device, are usually not optimized for parallel execution.
Don't try to use parallel execution on systems that do not meet the minimum requirements (multiple CPUs and data striped across multiple drives).
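As a rough check of that configuration (a sketch; the parameter names are those documented for Oracle 11g Release 2), you can query the CPU count and the key parallel parameters:

SQL> SELECT name, value
2 FROM v$parameter
3 WHERE name IN ('cpu_count', 'parallel_max_servers',
4 'parallel_threads_per_cpu', 'parallel_degree_policy');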
Make sure that all parts of the execution plan are parallelized
In a complex parallel SQL statement, it's important to ensure that all significant steps in the query execution are implemented in parallel. If one of the steps in a complex query is performed in serial, the other parallel steps might have to wait for the serial step to complete, and the advantages of parallelism will be lost. The OTHER_TAG column of the PLAN_TABLE indicates such a step with the PARALLEL_FROM_SERIAL tag, and DBMS_XPLAN records S->P in the IN-OUT column.
For instance, in the following example the CUSTOMERS table is parallelized, but the SALES table is not. The join and GROUP BY of the two tables includes many parallelized operations, but the full table scan of SALES is not parallelized and the tell-tale S->P tag shows that SALES rows are fed in serial into subsequent parallel operations:
SQL> ALTER TABLE customers PARALLEL(DEGREE 4);
SQL> ALTER TABLE sales NOPARALLEL ;
SQL> EXPLAIN PLAN FOR
2 SELECT /*+ ordered use_hash(c) */
3 cust_last_name, SUM (amount_sold)
4 FROM sales s JOIN customers c
5 USING (cust_id)
6 GROUP BY cust_last_name;
SQL> SELECT * FROM table (DBMS_XPLAN.display
(NULL, NULL, 'BASIC +PARALLEL'));
| Id | Operation            | Name      | TQ    | IN-OUT | PQ Distrib |
|----|----------------------|-----------|-------|--------|------------|
|  0 | SELECT STATEMENT     |           |       |        |            |
|  1 | PX COORDINATOR       |           |       |        |            |
|  2 | PX SEND QC (RANDOM)  | :TQ10002  | Q1,02 | P->S   | QC (RAND)  |
|  3 | HASH GROUP BY        |           | Q1,02 | PCWP   |            |
|  4 | PX RECEIVE           |           | Q1,02 | PCWP   |            |
|  5 | PX SEND HASH         | :TQ10001  | Q1,01 | P->P   | HASH       |
|  6 | HASH GROUP BY        |           | Q1,01 | PCWP   |            |
|  7 | HASH JOIN            |           | Q1,01 | PCWP   |            |
|  8 | BUFFER SORT          |           | Q1,01 | PCWC   |            |
|  9 | PX RECEIVE           |           | Q1,01 | PCWP   |            |
| 10 | PX SEND BROADCAST    | :TQ10000  |       | S->P   | BROADCAST  |
| 11 | VIEW                 | VW_GBC_5  |       |        |            |
| 12 | HASH GROUP BY        |           |       |        |            |
| 13 | TABLE ACCESS FULL    | SALES     |       |        |            |
| 14 | PX BLOCK ITERATOR    |           | Q1,01 | PCWC   |            |
| 15 | TABLE ACCESS FULL    | CUSTOMERS | Q1,01 | PCWP   |            |
A partially parallelized execution plan, such as the preceding one, can deliver the worst of both worlds: elapsed time is not improved, because the serial operation forms a bottleneck on overall execution, yet the SQL still ties up parallel server processes and might impact the performance of other concurrently executing SQL.
If we set a default degree of parallelism for the SALES table, the serial bottleneck disappears: the full scan of SALES is performed in parallel, and the S->P bottleneck is replaced by the fully parallelized P->P operation, as the plan that follows shows.
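The command that sets the default DOP is not reproduced in this excerpt, but it would look something like this (with no explicit degree, Oracle computes the default for the table at run time):

SQL> ALTER TABLE sales PARALLEL;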
| Id | Operation            | Name      | TQ    | IN-OUT |
|----|----------------------|-----------|-------|--------|
|  0 | SELECT STATEMENT     |           |       |        |
|  1 | PX COORDINATOR       |           |       |        |
|  2 | PX SEND QC (RANDOM)  | :TQ10003  | Q1,03 | P->S   |
|  3 | HASH GROUP BY        |           | Q1,03 | PCWP   |
|  4 | PX RECEIVE           |           | Q1,03 | PCWP   |
|  5 | PX SEND HASH         | :TQ10002  | Q1,02 | P->P   |
|  6 | HASH GROUP BY        |           | Q1,02 | PCWP   |
|  7 | HASH JOIN            |           | Q1,02 | PCWP   |
|  8 | PX RECEIVE           |           | Q1,02 | PCWP   |
|  9 | PX SEND BROADCAST    | :TQ10001  | Q1,01 | P->P   |
| 10 | VIEW                 | VW_GBC_5  | Q1,01 | PCWP   |
| 11 | HASH GROUP BY        |           | Q1,01 | PCWP   |
| 12 | PX RECEIVE           |           | Q1,01 | PCWP   |
| 13 | PX SEND HASH         | :TQ10000  | Q1,00 | P->P   |
| 14 | HASH GROUP BY        |           | Q1,00 | PCWP   |
| 15 | PX BLOCK ITERATOR    |           | Q1,00 | PCWC   |
| 16 | TABLE ACCESS FULL    | SALES     | Q1,00 | PCWP   |
| 17 | PX BLOCK ITERATOR    |           | Q1,02 | PCWC   |
| 18 | TABLE ACCESS FULL    | CUSTOMERS | Q1,02 | PCWP   |
When optimizing a parallelized execution plan, ensure that all relevant steps are executed in parallel: The S->P tag in DBMS_XPLAN or PARALLEL_FROM_SERIAL in the PLAN_TABLE often indicates a serial bottleneck in an otherwise parallel plan.
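If you prefer to check for serial bottlenecks programmatically rather than reading the plan by eye, a simple sketch against the standard PLAN_TABLE columns is:

SQL> SELECT id, operation, options, object_name
2 FROM plan_table
3 WHERE other_tag = 'PARALLEL_FROM_SERIAL';

Any rows returned identify steps that feed data from a serial operation into parallel consumers.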
Ensure that the requested DOP is realistic
We saw previously (in Figure 13-5, for instance) how increasing the DOP beyond the optimal level can place excessive load on the system without improving performance. In worst-case scenarios, increasing the DOP beyond the optimal level can increase query elapsed time as well. Therefore, setting an appropriate DOP is important both for the health of the database as a whole and for the optimal performance of the query being parallelized.

Ensure that your requested or expected DOP is realistic; an overly high DOP can result in excessive load on the database server without improving the SQL's performance.
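For example, rather than relying on a table-level setting, you might request a deliberately modest DOP with a hint; the degree of 4 below is purely illustrative:

SQL> SELECT /*+ parallel(s, 4) */
2 cust_id, SUM (amount_sold)
3 FROM sales s
4 GROUP BY cust_id;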
Monitor the actual DOP
Your requested DOP might be optimal but not always achievable. When multiple parallelized queries contend for finite parallel execution resources, the DOP might be reduced, or the SQL statement might be run in serial mode.
We previously discussed how Oracle decides on the actual DOP; most importantly, the parameters PARALLEL_MIN_PERCENT, PARALLEL_DEGREE_POLICY, and PARALLEL_ADAPTIVE_MULTI_USER control how Oracle changes the DOP and whether a statement runs at reduced parallelism, terminates with an error, or is deferred for later processing when insufficient resources exist to run the statement at the requested DOP.

Reductions in the DOP can result in disappointing performance for your parallel SQL. You should monitor query execution to see if such reductions in the DOP are actually occurring. We previously saw how we can use V$PQ_TQSTAT to measure the actual DOP and how we can use statistics in V$SYSSTAT to measure parallel downgrades overall.
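For example, a quick look at the downgrade statistics (a sketch; the statistic names are those exposed in V$SYSSTAT) might be:

SQL> SELECT name, value
2 FROM v$sysstat
3 WHERE name LIKE 'Parallel operations%';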
If you determine that downgraded parallelism is leading to disappointing performance, you might want to revisit your system resources (memory, IO bandwidth), the scheduling of parallel SQLs, or your server configuration. Possible options include:
- Rescheduling parallel SQLs so that they do not attempt to run concurrently. Oracle 11g Release 2 can automatically reschedule SQLs if PARALLEL_DEGREE_POLICY is set to AUTO.
- Adjusting parallel configuration parameters to allow greater concurrent parallelism. You can do this by increasing PARALLEL_THREADS_PER_CPU or PARALLEL_MAX_SERVERS. The risk here is that the amount of parallel execution will be greater than your system can support, leading to degraded SQL performance.
- Increasing the power of your database server. You can increase the number of CPUs, the number of instances in a RAC cluster, and the number of disks in your disk array.
- Adjusting PARALLEL_MIN_PERCENT to enable SQLs to run at reduced parallelism rather than signaling an error.
Disappointing parallel performance might be the result of Oracle downgrading the requested DOP due to concurrent load or limits on parallel execution resources.
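As an illustration only (the values shown are arbitrary and should come from your own capacity planning), the parameter-related options above might translate into statements such as:

SQL> ALTER SESSION SET parallel_min_percent = 0;
SQL> ALTER SYSTEM SET parallel_max_servers = 128;

A PARALLEL_MIN_PERCENT of 0, the default, means Oracle accepts whatever DOP is available rather than raising an error.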
Check for skew in data and skew in workload between processes
Parallel processing works best when every parallel process in a step has the same amount of work to do. If one slave process has more work than a peer process, the "lazy" slave will wait for the "busy" slave, and we won't get performance improvements in line with the number of processes working on the SQL.
Most of the algorithms that Oracle employs are designed to achieve an even distribution of data; these algorithms include the HASH, ROUND ROBIN, and RANDOM distribution mechanisms. However, when a sort operation is performed, Oracle cannot use these random or pseudo-random mechanisms. Instead, Oracle must distribute data to the slaves based on the sort key columns. We saw an example of this in Figure 13-2 where a parallel process fed rows from A–K to one slave for sorting and rows from L–Z to the other.
If the distribution of data in the sort column is very skewed, this allocation might be uneven. For instance, consider this simple query:
SQL> EXPLAIN PLAN
2 FOR
3 SELECT /*+ parallel */
4 cust_last_name, cust_first_name, cust_year_of_birth
5 FROM customers
6 ORDER BY CUST_LAST_NAME;
| Id | Operation            | Name      | TQ    | IN-OUT | PQ Distrib |
|----|----------------------|-----------|-------|--------|------------|
|  0 | SELECT STATEMENT     |           |       |        |            |
|  1 | PX COORDINATOR       |           |       |        |            |
|  2 | PX SEND QC (ORDER)   | :TQ10001  | Q1,01 | P->S   | QC (ORDER) |
|  3 | SORT ORDER BY        |           | Q1,01 | PCWP   |            |
|  4 | PX RECEIVE           |           | Q1,01 | PCWP   |            |
|  5 | PX SEND RANGE        | :TQ10000  | Q1,00 | P->P   | RANGE      |
|  6 | PX BLOCK ITERATOR    |           | Q1,00 | PCWC   |            |
|  7 | TABLE ACCESS FULL    | CUSTOMERS | Q1,00 | PCWP   |            |
In the preceding step 5, Oracle distributes data from one set of slaves to another based on the range of values contained in the sort column. If the data is well distributed, all should be well. However, should the data be heavily skewed (perhaps we have an unusually large number of Smiths and Zhangs), the distribution of data to slaves might become uneven. For example, the following V$PQ_TQSTAT output shows such an uneven distribution, with roughly twice as many rows directed to one slave as to the other (I deliberately skewed customer surnames to achieve this):
SQL> SELECT dfo_number, tq_id, server_Type, MIN (num_rows),
2 MAX (num_rows), COUNT ( * ) dop
3 FROM v$pq_tqstat
4 GROUP BY dfo_number, tq_id, server_Type
5 ORDER BY dfo_number, tq_id, server_type DESC;
| DFO_NUMBER | TQ_ID | SERVER_TYPE | MIN(NUM_ROWS) | MAX(NUM_ROWS) | DOP |
|------------|-------|-------------|---------------|---------------|-----|
|          1 |     0 | Ranger      |           182 |           182 |   1 |
|          1 |     0 | Producer    |        158968 |        174512 |   2 |
|          1 |     0 | Consumer    |        103262 |        230218 |   2 |
|          1 |     1 | Producer    |        103262 |        230218 |   2 |
|          1 |     1 | Consumer    |        333480 |        333480 |   1 |
Unfortunately, there might be little that can be done about such a data skew. Oracle does not appear to take histogram data into account when distributing rows between parallel slaves. If the distribution of rows seems particularly uneven, you can consider changing the DOP or reviewing whether the SQL is truly suitable for parallelizing.
Effective parallelism depends on the even distribution of processing across the parallel slave processes. V$PQ_TQSTAT enables you to evaluate the efficiency of the load balancing across the parallel slaves.
Printed with permission from Prentice Hall Inc. Oracle Performance Survival Guide: A Systematic Approach to Database Optimization by Guy Harrison. For more information about this title and other similar books, please visit http://www.prenticehall.com.